RDLL at CrossLink Anchor Extraction Considering Ambiguity in CLLD
Abstract
In this paper, we describe our work on the cross-lingual link discovery (CLLD) task at NTCIR-10. Our method focuses on two problems: how to find the important anchors in a source article, and how to link each anchor to the correct article in the target language. The system first uses online data collected from Japanese Wikipedia articles to build a basic crosslink database. This database is then used to identify anchors and to find the corresponding English articles. We carried out the task in three steps. First, we parsed the Japanese articles and extracted candidate anchors. Second, we ranked the anchors by their importance weights. Third, we determined the correct English article for each anchor. We achieved an LMAP of 0.151 under manual assessment.
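The three steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the crosslink database is stored, how the weights are computed, or how ambiguous anchors are resolved, so everything below is an assumption for illustration only: the `CROSSLINK_DB` contents, the function names, the weight (occurrences in the text times total link count), and the rule of picking the most frequently linked English article.

```python
from collections import Counter

# Hypothetical crosslink database (illustrative data, not from the paper):
# Japanese anchor text -> list of (English article title, link count).
CROSSLINK_DB = {
    "東京": [("Tokyo", 120), ("Tokyo (film)", 3)],
    "大学": [("University", 40)],
    "東京大学": [("University of Tokyo", 75)],
}


def extract_candidates(text, db, max_len=6):
    """Step 1: count every substring of the text that is a known anchor."""
    found = Counter()
    for start in range(len(text)):
        last = min(len(text), start + max_len)
        for end in range(start + 1, last + 1):
            span = text[start:end]
            if span in db:
                found[span] += 1
    return found


def rank_anchors(candidates, db):
    """Step 2: rank anchors by an assumed weight:
    occurrences in the text times total link count in the database."""
    weights = {
        anchor: count * sum(links for _, links in db[anchor])
        for anchor, count in candidates.items()
    }
    # Highest weight first; ties broken by anchor text for determinism.
    return sorted(weights, key=lambda anchor: (-weights[anchor], anchor))


def resolve_target(anchor, db):
    """Step 3: resolve ambiguity by taking the most frequently linked
    English article (an assumed rule, not the paper's)."""
    return max(db[anchor], key=lambda pair: pair[1])[0]


def crosslink(text, db=CROSSLINK_DB):
    """Run the three steps and return (anchor, English title) pairs."""
    candidates = extract_candidates(text, db)
    return [(a, resolve_target(a, db)) for a in rank_anchors(candidates, db)]


if __name__ == "__main__":
    for anchor, target in crosslink("東京大学は東京にある大学です"):
        print(anchor, "->", target)
```

In this sketch an ambiguous anchor such as "東京" always resolves to its most frequently linked target; a real system would presumably also use the surrounding context of the anchor.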
Similar resources
DCU at NTCIR-10 Cross-lingual Link Discovery (CrossLink-2) Task
DCU participated in the English to Chinese (E2C) and Chinese to English (C2E) subtasks of the NTCIR 10 CrossLink2 Cross-lingual Link Discovery (CLLD) task. Our strategy for each query involved extracting potential link anchors as n-gram strings, cleaning of potential anchor strings, and anchor expansion and ranking to select a set of anchors for the query. Potential anchors were translated usin...
UKP at CrossLink: Anchor Text Translation for Cross-lingual Link Discovery
This paper describes UKP’s participation in the cross-lingual link discovery (CLLD) task at NTCIR-9. The given task is to find valid anchor texts from a new English Wikipedia page and retrieve the corresponding target Wiki pages in Chinese, Japanese, and Korean languages. We have developed a CLLD framework consisting of anchor selection, anchor ranking, anchor translation, and target discovery ...
IISR Crosslink Approach at NTCIR 9 CLLD Task
In this paper, we describe our approach to the English-Korean Cross-Lingual Link Discovery (CLLD) task in NTCIR 9. We propose a simple and effective approach to discover the links. Our method comprises preprocessing steps, anchor-target link mapping, and the ranking steps. For discovering the links, we use the English anchor names, the inter-language links, and the translation by the Google Tra...
WUST EN-CS Crosslink System at NTCIR-9 CLLD Task
This paper describes our work in NTCIR-9 on the task of Cross-Lingual Link Discovery (Crosslink/CLLD). The work mainly focuses on two aspects to accomplish this task: (1) How to collect useful data for Crosslink and (2) How to use the data correctly and effectively. The system firstly uses online data collecting and text mining in Chinese Wikipedia articles to build the basic Crosslink database...
NTHU at NTCIR-10 CrossLink-2: An Approach toward Semantic Features
This paper describes the approaches of NTHU in the NTCIR-10 Cross-Lingual Link Discovery task, also named CrossLink-2. In this task, we aim to discover valuable anchors in Chinese, Japanese or Korean (CJK) articles and to link these anchors to related English Wikipedia pages. To achieve the objective, we do not only depend on Wikipedia’s distinguishing features (e.g. anchor links information an...